Supplementary for Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

1 Further Results of the Impact of Sparsity on the Shape Bias Benchmark

Neural Information Processing Systems

We utilize the sparsity operation proposed in Section 3.1 for ResNet-50. We generalize Section 4.2 in the main text to the ResNet-50 and ViT-B architectures (Figure 1). We apply the Sparsity layer in a subset of the network, based on the intuition that the brain utilizes sparsity for long-range communication but can allow dense local computation. We divide the networks into chunks: within each chunk the neurons' activities are allowed to be dense (kept as in the original network), while communication across different chunks is set to be sparse.
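As a concrete illustration, the following is a minimal PyTorch sketch of one way such a chunked scheme could be wired up. The `TopKSparsity` module, the `keep_ratio` parameter, and the `chunked_model` helper are assumptions for illustration; the actual sparsity operation is the one defined in Section 3.1 of the paper.

```python
import torch
import torch.nn as nn

class TopKSparsity(nn.Module):
    """Keep the top-k fraction of activations per sample and zero the rest.

    A stand-in for the sparsity operation of Section 3.1; the exact
    formulation in the paper may differ.
    """
    def __init__(self, keep_ratio: float = 0.1):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = x.flatten(start_dim=1)                 # (batch, features)
        k = max(1, int(self.keep_ratio * flat.shape[1]))
        thresh = flat.topk(k, dim=1).values[:, -1:]   # k-th largest per sample
        mask = (flat >= thresh).float()
        return (flat * mask).view_as(x)

def chunked_model(chunks: list[nn.Module], keep_ratio: float = 0.1) -> nn.Sequential:
    """Dense computation inside each chunk; sparsity applied only at the
    chunk boundaries, i.e., on the long-range communication."""
    layers: list[nn.Module] = []
    for i, chunk in enumerate(chunks):
        layers.append(chunk)                          # dense within a chunk
        if i < len(chunks) - 1:
            layers.append(TopKSparsity(keep_ratio))   # sparse across chunks
    return nn.Sequential(*layers)
```

For ResNet-50, one natural choice of chunks would be the four residual stages, so that sparsity is imposed only on the activations passed between stages; the paper's exact chunk boundaries may differ.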


Appendix A Theory

Neural Information Processing Systems

In this section, we present the proofs of the results in the main body. Eq. (1) satisfies the triangle inequality for any scoring functions; the second inequality is proved similarly. Before presenting the proof of the theorem, we first provide some lemmas. By applying Lemma A.2, a generalization bound on the risk $R$ holds with probability at least $1 - \alpha$. By Lemma A.1, the margin loss satisfies the triangle inequality, and by Lemma A.4 we obtain the corresponding bound on $R$. By Theorem 4.4, the stated bound holds for any scoring function. Based on Theorem A.6, the standard error bound for gradual AST can be derived similarly to Corollary 4.6.
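To make the triangle-inequality step concrete, here is one plausible form it could take, written in LaTeX under the assumption that Eq. (1) defines a discrepancy $d(f, g)$ between scoring functions as an expected loss with a loss $\ell$ that is itself a metric; the actual definition of Eq. (1) is in the main text and may differ.

```latex
% Hypothetical form of the triangle inequality, assuming Eq. (1) is
% d(f, g) = E_x[ \ell(f(x), g(x)) ] with \ell a metric on scores.
\begin{align}
  d(f, h) &= \mathbb{E}_{x}\bigl[\ell\bigl(f(x), h(x)\bigr)\bigr] \\
          &\le \mathbb{E}_{x}\bigl[\ell\bigl(f(x), g(x)\bigr)
               + \ell\bigl(g(x), h(x)\bigr)\bigr] \\
          &= d(f, g) + d(g, h).
\end{align}
```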



Supplementary Material

Neural Information Processing Systems

The color has been normalized to be between 0 and 1, which does not affect the clustering or visualization. We can see that output representations from later layers yield more patterned kernel matrices with more erratic clustering.
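For reference, the following is a minimal sketch of how such a normalized kernel matrix could be computed from one layer's output representations. The linear kernel and min-max normalization are assumptions for illustration; the paper's exact kernel may differ, and since min-max normalization is a monotone rescaling it does not change the relative structure used for clustering.

```python
import numpy as np

def normalized_kernel_matrix(features: np.ndarray) -> np.ndarray:
    """Compute a Gram (kernel) matrix from layer outputs, rescaled to [0, 1].

    features: (n_samples, n_features) array of representations from one layer.
    A linear kernel is assumed here for illustration.
    """
    K = features @ features.T                        # pairwise similarities
    K = (K - K.min()) / (K.max() - K.min() + 1e-12)  # min-max normalize to [0, 1]
    return K
```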